Utilization of ordinal response structures in classification with high-dimensional expression data
Abstract
Molecular diagnosis or prediction of clinical treatment outcome based on high-throughput genomics data is a modern application of machine learning to clinical problems. In practice, clinical parameters such as patient health status or toxic reaction to therapy are often measured on an ordinal scale (e.g. good, fair, poor). The prediction of ordinal end-points is commonly treated as a multi-class classification problem, which disregards the ordering information contained in the response and may therefore cost prediction accuracy. Classical approaches that model an ordinal response directly, such as the cumulative logit model, are typically not applicable to high-dimensional data. We present hierarchical twoing (hi2), a novel algorithm for classifying high-dimensional data into ordered categories. hi2 combines the power of well-understood binary classification with ordinal response prediction. A comparison of several approaches to ordinal classification on real-world and simulated data shows that classification algorithms specifically designed to handle ordered categories fail to improve upon state-of-the-art non-ordinal classification algorithms. In general, an algorithm's classification performance is dominated by its ability to deal with the high dimensionality of the data. Only hi2 outperforms its competitors in the case of moderate effects.
1998 ACM Subject Classification: I.5.2 Design Methodology
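The abstract does not specify how hi2 is constructed, so the sketch below does not reproduce it. It is a minimal illustration, under stated assumptions, of one generic way to reuse binary classifiers for an ordinal response: K ordered classes are turned into K-1 binary problems "is y > k?", each solved with an L2-penalized (ridge) logistic regression, chosen here only as one hypothetical way to cope with more features than samples (p > n), and a sample's predicted rank is the number of thresholds it is judged to exceed. For contrast, the cumulative logit model ties all thresholds together as logit P(y <= k | x) = θ_k - βᵀx with a single shared coefficient vector β, whereas the binary models here are fitted independently. All names (`fit_ridge_logistic`, `CumulativeBinaryOrdinal`, `toy_data`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only: this is NOT hi2. All names here are hypothetical.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X: Sequence[Sequence[float]], t: Sequence[int],
                       lam: float = 0.01, lr: float = 0.5,
                       n_iter: int = 500) -> Tuple[List[float], float]:
    """Binary logistic regression with an L2 (ridge) penalty, fitted by batch
    gradient descent on mean log-loss + (lam / 2) * ||w||^2. The penalty keeps
    the fit well posed when features outnumber samples (p > n)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w = [lam * wj for wj in w]  # gradient of the penalty term
        grad_b = 0.0
        for xi, ti in zip(X, t):
            # residual of this sample, scaled for the mean loss
            r = (sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - ti) / n
            for j in range(p):
                grad_w[j] += r * xi[j]
            grad_b += r
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


class CumulativeBinaryOrdinal:
    """Ordinal classifier assembled from K-1 binary 'is y > k?' models."""

    def __init__(self, n_classes: int, **fit_kwargs):
        self.n_classes = n_classes
        self.fit_kwargs = fit_kwargs
        self.models: List[Tuple[List[float], float]] = []

    def fit(self, X, y):
        # Labels are assumed to be 0, 1, ..., K-1 in their natural order.
        self.models = [
            fit_ridge_logistic(X, [1 if yi > k else 0 for yi in y], **self.fit_kwargs)
            for k in range(self.n_classes - 1)
        ]
        return self

    def prob_greater(self, x) -> List[float]:
        # Estimated P(y > k | x) for k = 0, ..., K-2, one per binary model.
        return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
                for w, b in self.models]

    def predict(self, x) -> int:
        # Predicted rank = number of thresholds the sample is judged to exceed.
        return sum(pk > 0.5 for pk in self.prob_greater(x))


def toy_data(p: int = 20):
    """Hypothetical toy set with n = 9 samples and p = 20 features (p > n).
    Feature 0 separates three ordered classes; the other features are
    uninformative and balanced (sum to zero) within each class."""
    X, y = [], []
    for label, center in enumerate((-2.0, 0.0, 2.0)):
        for i in range(3):
            X.append([center] + [0.1 * (i - 1) * (j % 3 + 1) for j in range(p - 1)])
            y.append(label)
    return X, y


X, y = toy_data()
clf = CumulativeBinaryOrdinal(n_classes=3).fit(X, y)
print([clf.predict(x) for x in X])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Because the K-1 binary models are fitted independently, their estimates of P(y > k | x) need not decrease in k. Counting exceeded thresholds still returns a valid rank in 0..K-1, whereas differencing adjacent estimates into class probabilities could yield negative values.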
Similar resources
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume that predictor variables are independent and that the number of samples (n) is large compared with the number of covariates (p). Therefore, it is not possible to use conventional models for high-dimensional genetic data in which p > n. The present study compared th...
Modelling of Correlated Ordinal Responses, by Using Multivariate Skew Probit with Different Types of Variance Covariance Structures
In this paper, a multivariate fundamental skew probit (MFSP) model is used to model correlated ordinal responses; the model is constructed from the multivariate fundamental skew normal (MFSN) distribution owing to the greater flexibility of MFSN. To achieve an appropriate variance-covariance (VC) structure and thus reliable statistical inferences, many types of VC structures are considered ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression profiling gives access to a wealth of information comprising thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting valuable, high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
Modeling Paired Ordinal Response Data
About 25 years ago, McCullagh proposed a method for modeling univariate ordinal responses. After this paper was published, other statisticians gradually extended his method, so that we are now able to use more complicated but efficient methods to analyze correlated multivariate ordinal data and to model the relationship between these responses and a host of covariates. In this paper, we aim to...
Statistical Methods to Enhance Clinical Prediction with High-Dimensional Data and Ordinal Response
Advancing technology has enabled us to study the molecular configuration of single cells or whole tissue samples. Molecular biology produces vast amounts of high-dimensional omics data at continually decreasing costs, so that molecular screens are increasingly often used in clinical applications. Personalized diagnosis or prediction of clinical treatment outcome based on high-throughput omics d...
Publication date: 2013